13 results found.
Language Type:
Trilingual
Languages:
Pushto Tagalog Turkish
Availability:
From Owner
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
Paper:
N/A
Documentation:
<Not Specified>
Written
Treebank,
Language Type:
Monolingual
Languages:
Afrikaans Akkadian Amharic Ancient Greek Arabic Armenian Assyrian Bambara Basque Belarusian Bhojpuri Breton Bulgarian Buryat Cantonese Catalan Chinese Classical Chinese Coptic Croatian Czech Danish Dutch English Erzya Estonian Faroese Finnish French Galician German Gothic Greek Hebrew Hindi Hindi English Hungarian Indonesian Irish Italian Japanese Karelian Kazakh Komi Permyak Komi Zyrian Korean Kurmanji Latin Latvian Lithuanian Livvi Maltese Marathi Mbya Guarani Moksha Naija North Sami Norwegian Old Church Slavonic Old French Old Russian Persian Polish Portuguese Romanian Russian Sanskrit Scottish Gaelic Serbian Skolt Sami Slovak Slovenian Spanish Swedish Swedish Sign Language Swiss German Tagalog Tamil Telugu Thai Turkish Ukrainian Upper Sorbian Urdu Uyghur Vietnamese Warlpiri Welsh Wolof Yoruba
Availability:
Freely Available
License:
Various
Size:
25 million words
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joakim Nivre | Universal Dependencies | /N |
Documentation:
https://universaldependencies.org
Speech
Phonetic Database,
Language Type:
Multilingual
Languages:
Amharic English French German Italian Japanese Javanese Kazakh Mandarin Russian Spanish Tagalog Turkish Vietnamese
Availability:
Freely Available
License:
MIT
Size:
714 entries
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: AlloVera: A Multilingual Allophone Database
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | David R. Mortensen | AlloVera | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Bulgarian Pashto Tagalog
Availability:
From Owner
License:
Size:
None
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Leveraging non-target language resources to improve ASR performance in a target language
- Paper track: 8.6 Neural network training methods (including new/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jayadev Billa | IARPA MATERIAL program BUILD/ANALYSIS set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Bulgarian Pashto Tagalog
Availability:
Freely Available
License:
Size:
None
Production Status:
Not Applicable
Use:
Speech Recognition/Understanding
- Paper title: Leveraging non-target language resources to improve ASR performance in a target language
- Paper track: 8.6 Neural network training methods (including new/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jayadev Billa | YouTube Audio | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Amharic Javanese Kazakh Tagalog Turkish Vietnamese
Availability:
License:
Size:
300 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Differentiable Allophone Graphs for Language Universal Speech Recognition
- Paper track: 9.8 Cross-lingual and multilingual components for /Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Brian Yan | IARPA Babel Language Pack | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
640 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
950 hours
Production Status:
Existing-updated
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Congo Swahili Somali Tagalog
Availability:
From Owner
License:
Not released to the public yet, probably LDC in the future
Size:
360 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Untranscribed web audio for low resource speech recognition
- Paper track: 8.8 Acoustic model adaptation (e.g. bandwidth, emo/Poster Presentation
- Paper status: Accept - Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Andrea Carmantini | IARPA MATERIAL language packs | /N |
Documentation:
English documentation, distributed with the corpus




